probabilistic verification
BaB-prob: Branch and Bound with Preactivation Splitting for Probabilistic Verification of Neural Networks
Wang, Fangji, Tsiotras, Panagiotis
Branch-and-bound with preactivation splitting has been shown highly effective for deterministic verification of neural networks. In this paper, we extend this framework to the probabilistic setting. We propose BaB-prob that iteratively divides the original problem into subproblems by splitting preactivations and leverages linear bounds computed by linear bound propagation to bound the probability for each subproblem. We prove soundness and completeness of BaB-prob for feedforward-ReLU neural networks. Furthermore, we introduce the notion of uncertainty level and design two efficient strategies for preactivation splitting, yielding BaB-prob-ordered and BaB+BaBSR-prob. We evaluate BaB-prob on untrained networks, MNIST and CIFAR-10 models, respectively, and VNN-COMP 2025 benchmarks. Across these settings, our approach consistently outperforms state-of-the-art approaches in medium- to high-dimensional input problems.
Tractable Asymmetric Verification for Large Language Models via Deterministic Replicability
Chong, Zan-Kai, Ohsaki, Hiroyuki, Ng, Bryan
This introduces a fundamental challenge in establishing computational trust, specifically how one agent can verify that another's output was genuinely produced by a claimed LLM, and not falsified or generated by a cheaper or inferior model. T o address this challenge, this paper proposes a verification framework that achieves tractable asymmetric effort, where the cost to verify a computation is substantially lower than the cost to perform it. Our approach is built upon the principle of deterministic replicability, a property inherent to autoregressive models that strictly necessitates a computationally homogeneous environment where all agents operate on identical hardware and software stacks. Within this defined context, our framework enables multiple validators to probabilistically audit small, random segments of an LLM's output and it distributes the verification workload effectively. The simulations demonstrated that targeted verification can be over 12 times faster than full regeneration, with tunable parameters to adjust the detection probability. By establishing a tractable mechanism for auditable LLM systems, our work offers a foundational layer for responsible AI and serves as a cornerstone for future research into the more complex, heterogeneous multi-agent systems.
SAVER: A Toolbox for Sampling-Based, Probabilistic Verification of Neural Networks
Sivaramakrishnan, Vignesh, Kalagarla, Krishna C., Devonport, Rosalyn, Pilipovsky, Joshua, Tsiotras, Panagiotis, Oishi, Meeko
We present a neural network verification toolbox to 1) assess the probability of satisfaction of a constraint, and 2) synthesize a set expansion factor to achieve the probability of satisfaction. Specifically, the tool box establishes with a user-specified level of confidence whether the output of the neural network for a given input distribution is likely to be contained within a given set. Should the tool determine that the given set cannot satisfy the likelihood constraint, the tool also implements an approach outlined in this paper to alter the constraint set to ensure that the user-defined satisfaction probability is achieved. The toolbox is comprised of sampling-based approaches which exploit the properties of signed distance function to define set containment.
Probabilistic Verification of Neural Networks using Branch and Bound
Boetius, David, Leue, Stefan, Sutter, Tobias
Probabilistic verification of neural networks is concerned with formally analysing the output distribution of a neural network under a probability distribution of the inputs. Examples of probabilistic verification include verifying the demographic parity fairness notion or quantifying the safety of a neural network. We present a new algorithm for the probabilistic verification of neural networks based on an algorithm for computing and iteratively refining lower and upper bounds on probabilities over the outputs of a neural network. By applying state-of-the-art bound propagation and branch and bound techniques from non-probabilistic neural network verification, our algorithm significantly outpaces existing probabilistic verification algorithms, reducing solving times for various benchmarks from the literature from tens of minutes to tens of seconds. Furthermore, our algorithm compares favourably even to dedicated algorithms for restricted subsets of probabilistic verification. We complement our empirical evaluation with a theoretical analysis, proving that our algorithm is sound and, under mildly restrictive conditions, also complete when using a suitable set of heuristics.
A Unified Framework for Probabilistic Verification of AI Systems via Weighted Model Integration
Morettin, Paolo, Passerini, Andrea, Sebastiani, Roberto
However, the complexity and versatility of modern AI systems calls for a unified framework to assess their trustworthiness, which cannot The probabilistic formal verification (PFV) of be captured by a single evaluation metric or formal property. AI systems is in its infancy. So far, approaches This papers aims to introduce such a framework. We have been limited to ad-hoc algorithms for specific show how by leveraging the Weighted Model Integration classes of models and/or properties. We propose (WMI) [Belle et al., 2015] formalism, it is possible to devise a unifying framework for the PFV of AI systems a unified formulation for the probabilistic verification of based on Weighted Model Integration (WMI), combinatorial AI systems. Broadly speaking, WMI is the which allows to frame the problem in very general task of computing probabilities of arbitrary combinations terms. Crucially, this reduction enables the verification of logical and algebraic constraints given a structured joint of many properties of interest, like fairness, distribution over both continuous and discrete variables.
Elliptical Slice Sampling for Probabilistic Verification of Stochastic Systems with Signal Temporal Logic Specifications
Scher, Guy, Sadraddini, Sadra, Tedrake, Russ, Kress-Gazit, Hadas
Autonomous robots typically incorporate complex sensors in their decision-making and control loops. These sensors, such as cameras and Lidars, have imperfections in their sensing and are influenced by environmental conditions. In this paper, we present a method for probabilistic verification of linearizable systems with Gaussian and Gaussian mixture noise models (e.g. from perception modules, machine learning components). We compute the probabilities of task satisfaction under Signal Temporal Logic (STL) specifications, using its robustness semantics, with a Markov Chain Monte-Carlo slice sampler. As opposed to other techniques, our method avoids over-approximations and double-counting of failure events. Central to our approach is a method for efficient and rejection-free sampling of signals from a Gaussian distribution such that satisfy or violate a given STL formula. We show illustrative examples from applications in robot motion planning.
Probabilistic Verification of Neural Networks Against Group Fairness
Sun, Bing, Sun, Jun, Dai, Ting, Zhang, Lijun
Fairness is crucial for neural networks which are used in applications with important societal implication. Recently, there have been multiple attempts on improving fairness of neural networks, with a focus on fairness testing (e.g., generating individual discriminatory instances) and fairness training (e.g., enhancing fairness through augmented training). In this work, we propose an approach to formally verify neural networks against fairness, with a focus on independence-based fairness such as group fairness. Our method is built upon an approach for learning Markov Chains from a user-provided neural network (i.e., a feed-forward neural network or a recurrent neural network) which is guaranteed to facilitate sound analysis. The learned Markov Chain not only allows us to verify (with Probably Approximate Correctness guarantee) whether the neural network is fair or not, but also facilities sensitivity analysis which helps to understand why fairness is violated. We demonstrate that with our analysis results, the neural weights can be optimized to improve fairness. Our approach has been evaluated with multiple models trained on benchmark datasets and the experiment results show that our approach is effective and efficient.
Probabilistic Verification for Cognitive Models
Junges, Sebastian (RWTH Aachen University) | Jansen, Nils (University of Texas at Austin) | Katoen, Joost-Pieter (RWTH Aachen University) | Topcu, Ufuk (University of Texas at Austin)
Many robotics applications and scenarios that involve interaction with humans are safety or performance critical. A natural path to assessing such notions is to include a cognitive model describing typical human behaviors into a larger modeling context. In this work, we set out to investigate a combination of such a model with formal verification. We present a general and flexible framework utilizing methods from probabilistic model checking and discuss current pitfalls. We start from information about typical behavior, obtained from generalizing specific scenarios by the usage of inverse reinforcement learning. We translate this information in order to define a formal model exhibiting stochastic behavior (whenever significant data is present) or nondeterminism (if the model is underspecified or no significant data is present) that can be analyzed. This model for a human can be combined with a robot model by using standard parallel composition. The benefit is manyfold: First, safe or optimal strategies for involved robots regarding a human can be synthesized depending on the given model. In general, verification can determine if such benign strategies are even possible. Furthermore, the cognitive model itself can be analyzed with respect to possible unnatural behaviors; thereby feedback to developers of such models is provided. We evaluate and describe our approaches by means of a well-known model for visiomotor tasks and provide a framework that can readily incorporate other models.